knitr::opts_chunk$set(
    echo = TRUE,
    message = FALSE,
    warning = FALSE,
    fig.height = 8,
    fig.width = 10,
    fig.align = "center"
)
library(tidyverse)
library(stringr)
library(janitor)
library(here)
library(ggrepel)
library(readr)
library(dplyr)
library(gt)
library(plotly)
library(forcats)
library(scales)
library(tidymodels)

Background

The relationship between mental health and education level has been studied extensively, with a number of theories emerging on how the two interact. Correlative trends have been observed in PhD students, possibly owing to increased stress (Bergvall et al., 2025). While other studies have shown a significantly reduced incidence of mental illness in educated individuals, hypothesising a protective effect (Maguire et al., 2017).

This report aims to explore the relationship between antidepressant prescriptions and education level across Scotland, using Public Health Scotland prescribing data, and data from Scotland’s most recent census in 2022. The prescribing data will be taken from January to August 2025 and a monthly average calculated, minimising any recording of repeat prescriptions following the NHS’ prescribing recommendations indicating an optimal 28-day repeat prescription duration (NHS Business Services Authority, 2020; Hertfordshire & West Essex ICB, n.d.).

Generative AI use: none.

Part 1: Exploring prescribing rates

Prescribing data can be found here: https://www.opendata.nhs.scot/dataset/prescriptions-in-the-community

files <- list.files(here("data","consecutive_data"), pattern = "csv", full.names = TRUE) # 8 months of prescribing data through 2025
months <- seq(as.Date("2025-01-01"), as.Date("2025-08-01"), by = "month") # labelling file months
all_prescribing <- map2_df(files, months, ~ read_csv(.x, 
                                                     col_types = cols(DMDCode = col_character(), # ensure DMD codes are all characters, as some are read as numbers 
                                                                      .default = col_guess()), show_col_types = FALSE) %>% 
                             mutate(month = .y)) %>% 
  clean_names() %>% 
  select(hbt, gp_practice, bnf_item_code, number_of_paid_items, month)

First, the prescribing data has to be filtered to show just the antidepressant prescriptions. This is made possible by the inclusion of BNF item codes in the Public Health Scotland prescribing data set; by identifying the relevant code for antidepressants we can filter the data set down to just the drugs we want to look at.

In order to find this number, the BNF code information provided by the NHS Business Services Authority can be filtered by the section name, then average monthly prescribing numbers for each health board can be calculated.

BNF information is available at: https://opendata.nhsbsa.net/dataset/bnf-code-information-current-year

BNF_info <- read_csv(here("data/bnf_code_current_202506_version_88.csv")) %>% 
  filter(BNF_SECTION == "Antidepressant drugs")  # filter the data to show just the antidepressants
invisible(unique(BNF_info$BNF_SECTION_CODE))  # 0403 is the code we use to focus our prescribing data

antidepressants <- all_prescribing %>% 
  filter(str_starts(bnf_item_code, "0403"), hbt != "SB0806") # remove special health board as we filter for antidepressants

antidepressants_avg <- antidepressants %>% 
  group_by(hbt, month) %>% 
  summarise(total_items = sum(number_of_paid_items, na.rm = TRUE), .groups = "drop") %>%  # summing the number of monthly antidepressant prescriptions for each health board
  group_by(hbt) %>% 
  summarise(mean_items = mean(total_items, na.rm = TRUE), .groups = "drop") # prescription averages for each health board across the 8 months

Next, some names for the health boards will be added to the data to allow for clearer visualisation down the line. I have also created a table with the numbers for the highest level of education (columns) achieved by healthboard (rows), using the flexible table builder on the Scottish census website. Using the health board names, this can be joined with the data for antidepressants.

Health board names are available at: https://www.opendata.nhs.scot/dataset/geography-codes-and-labels/resource/652ff726-e676-4a20-abda-435b98dd7bdc

Census table builder is available at: https://www.scotlandscensus.gov.uk/search-the-census#/

hb_names <- read_csv(here("data/hb_names.csv")) %>% 
  clean_names() %>% 
  select(c(hb, hb_name)) %>% 
  mutate(hb_name = gsub("^NHS ", "", hb_name)) # removes NHS prefix for each health board to match the naming conventions in the census data

data_antidepressants <- inner_join(hb_names, antidepressants_avg, by = c("hb" = "hbt")) 
invisible(anti_join(hb_names, antidepressants_avg, by = c("hb" = "hbt"))) # 4 codes dropped from hb_names, explained by  boundary changes as mentioned on the Public Health Scotland site
  
# we have to load the census data as raw text first to extract just the required information
raw_lines <- read_lines(here("data/education_level_healthboard.csv"))
info <- raw_lines[10:25] # keep only the headers and the data (lines 10–25), also excluding the extra total row we don't need
info <- info[-2] # remove the extra header line

census_data22 <- read_csv(
  I(info), # character vector as a connection
  col_names = TRUE, show_col_types = FALSE) %>%
  clean_names() %>%
  rename(hb = highest_level_of_qualification,
         NA_or_under_16 = not_applicable_aged_less_than_16,
         none = no_qualifications,
         lower_school = lower_school_qualifications,
         upper_school = upper_school_qualifications,
         further_education_sub_degree = further_education_and_sub_degree_higher_education_qualifications_incl_hnc_hn_ds,
         degree = degree_level_qualifications_or_above_education_qualifications_not_already_mentioned_including_foreign_qualifications,
         apprenticeship = apprenticeship_qualifications) %>% 
  remove_empty("cols") 
education_antidepressants <- inner_join(data_antidepressants, census_data22, by = c("hb_name" = "hb"))

# we can use the total column (health board population) from the census data to convert the flat numbers into proportions
education_antidepressants <- education_antidepressants %>% 
  mutate(apc = mean_items / total, degree_pct = degree/total*100) # monthly prescriptions per capita and percentage of degree holders in each healthboard(for sorting in the graph + later analysis)

education_antidepressants %>% 
  select(hb_name, mean_items, total, apc) %>% 
  arrange(desc(apc)) %>% 
  gt(
    rowname_col   = "hb_name"
  ) %>% 
  tab_stubhead(label = "Health board") %>% 
  cols_label(
    mean_items = "Average monthly items",
    total      = "Population",
    apc        = "Prescriptions per capita"
  ) %>% 
  tab_spanner(
    label = "Prescribing metrics",
    columns = c(mean_items, apc)
  ) %>% 
  fmt_number(
    columns = c(mean_items, apc),
    decimals = 2
  ) %>% 
  fmt_number(
    columns = total,
    decimals = 0,
    use_seps = TRUE
  ) %>% 
  grand_summary_rows(
    columns = c(mean_items, apc),
    fns = list(
      "Mean" = ~ mean(.x, na.rm = TRUE)
    )
  ) %>% 
  tab_source_note(
    source_note = "Data: Public Health Scotland; Prescriptions in the Community"
  ) %>%
  tab_style(
  style = list(
    cell_fill(color = "#e5e5e5"),
    cell_text(weight = "bold")
  ),
  locations = cells_column_labels(everything())
  ) %>%
  tab_style(
    style = cell_text(align = "left"),
    locations = list(
      cells_column_labels(everything()),
      cells_body(columns = everything())
    )
  ) %>% 
  tab_header(
    title = "Antidepressant Prescribing by Scottish Health Board",
    subtitle = "Prescription data from January to August 2025"
  ) %>% 
  tab_options(
    table.width = pct(80),
    heading.align = "center",
    heading.title.font.size = 24,
    heading.subtitle.font.size = 14,
    column_labels.font.weight = "bold"
  )
Antidepressant Prescribing by Scottish Health Board
Prescription data from January to August 2025
Health board
Prescribing metrics
Population
Average monthly items Prescriptions per capita
Dumfries and Galloway 24,197.25 0.17 145,895
Ayrshire and Arran 55,730.25 0.15 365,256
Lanarkshire 99,087.38 0.15 668,027
Greater Glasgow and Clyde 166,587.62 0.14 1,177,213
Forth Valley 41,738.50 0.14 302,784
Western Isles 3,587.50 0.14 26,140
Fife 49,578.62 0.13 371,781
Borders 15,569.50 0.13 116,821
Shetland 2,997.12 0.13 22,986
Tayside 51,971.12 0.13 413,992
Highland 37,482.75 0.12 321,321
Orkney 2,526.00 0.12 21,958
Grampian 62,793.38 0.11 581,040
Lothian 93,846.25 0.10 904,628
Mean 50549.52 0.1321014
Data: Public Health Scotland; Prescriptions in the Community

Results

The table shows the average monthly antidepressant prescriptions per capita from January to August 2025 in all 14 Scottish health boards. Dumfries and Galloway is the clear leader in prescribing rate, with a relatively noticeable gap between it and Ayrshire and Arran following in second, while Lothian has the lowest rate of the lot, but not too far removed from the other boards towards the bottom like Grampian, Orkney and Highland. The total range of prescriptions per capita is almost 0.07, there is a near 70% increase from the lowest prescribing board at 0.1 to the highest at 0.17.

Part 2: Exploring education levels

Now, using the same census data, the education levels of each health board should be looked at relative to each other, to gain an idea of the landscape of this variable.

education_antidepressants_longer <- education_antidepressants %>% 
  pivot_longer(
    c(NA_or_under_16:degree),
    names_to  = "edu_lvl",
    values_to = "edu_lvl_count"
  ) %>% 
  mutate(edu_lvl_pct = edu_lvl_count / total, edu_lvl = factor(edu_lvl, levels = rev(c( # reverse order of census education level progression for better visualisation
        "NA_or_under_16",
        "none",
        "lower_school",
        "upper_school",
        "apprenticeship",
        "further_education_sub_degree",
        "degree"
      ))
    ),
    edu_lvl = fct_recode( # plotly doesn't adopt legend labels set in ggplot so have to label the levels properly here
      edu_lvl,
      "Under 16 / N.A."                = "NA_or_under_16",
      "No qualifications"              = "none",
      "Lower school"                   = "lower_school",
      "Upper school"                   = "upper_school",
      "Apprenticeship"                 = "apprenticeship",
      "Further education (sub-degree)" = "further_education_sub_degree",
      "Degree or above"                = "degree"
    )
  )

graph1 <- education_antidepressants_longer %>% 
  ggplot(aes(
    x    = edu_lvl_pct,
    y    = reorder(hb_name, degree_pct), 
    fill = edu_lvl,
    text = paste(
      "Education level:", edu_lvl,
      "<br>Percentage:", percent(edu_lvl_pct, accuracy = 0.1) # for interactive graph tooltip
    )
  )) +
  geom_col(colour = "grey20", linewidth = 0.2) +
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  scale_fill_manual(
    name   = "Highest level of qualification",
    values = c(
      "Under 16 / N.A."                = "#f2e5ff",
      "No qualifications"              = "#d8b7ff",
      "Lower school"                   = "#be89ff",
      "Upper school"                   = "#a35cff",
      "Apprenticeship"                 = "#8930ff",
      "Further education (sub-degree)" = "#7013e6",
      "Degree or above"                = "#4d0099"
    )
  ) +
  labs(
    x = "Percentage of health board population",
    y = "Health board",
    title = "Education level distribution by Scottish health board"
  ) +
  theme(legend.position = "right",
      legend.title = element_text(size = 9),
      legend.text  = element_text(size = 8))
ggplotly(graph1, tooltip = "text")

Results

Shown is a stacked bar chart of the highest level of qualification achievement numbers for each Scottish health board. The standout detail from the graph is Lothian’s degree holder percentage at 34.9, a difference of 6.5% between it and the second highest in Greater Glasgow and Clyde. Dumfries and Galloway also stands out somewhat, sitting towards the bottom in degree holder percentage, with the highest proportion of unqualified and lower school qualifications in the country, also tying with Western Isles for the lowest proportion of under 16s.

Part 3: Investigating the relationship

The relationship between antidepressant prescriptions and education level can now be investigated. This can be done using the percentage of each health board’s population that are degree holders, the largest education level stratum, and running a simple linear regression on this and the prescribing rates.

# set linear regression beforehand for integration into plotly
lm_model <- linear_reg() %>% 
  set_engine('lm') %>% 
  set_mode("regression") %>% 
  fit(apc ~ degree_pct, data = education_antidepressants)
new_data <- tibble(degree_pct = seq(min(education_antidepressants$degree_pct),
                   max(education_antidepressants$degree_pct),
                   length.out = 100))

pred_data <- predict(lm_model, new_data = new_data) %>% 
  bind_cols(new_data)
lm_fit <- lm(apc ~ degree_pct, data = education_antidepressants)
ci_pred <- as_tibble(predict(lm_fit, newdata = new_data, interval = "confidence")) %>% # ci intervals
  bind_cols(new_data)

graph2 <- plot_ly(
  education_antidepressants,
  x = ~degree_pct,
  y = ~apc,
  type = "scatter",
  mode = "markers",
  marker = list(
    color   = "#061A61",
    size    = 15,
    opacity = 0.8
  ),
  alpha = 0.65,
  name = "Health boards",
  text = ~hb_name,
  hoverinfo = "text+x+y"
  ) %>%
  add_ribbons(
    data = ci_pred,
    x = ~degree_pct,
    ymin = ~lwr,
    ymax = ~upr,
    name = "95% CI",
    showlegend = FALSE,
    hoverinfo = "none",
    inherit = FALSE,
    fillcolor = "rgba(6, 26, 97, 0.15)",
    line = list(color = "rgba(0,0,0,0)")
  ) %>%
  add_lines(
    data = pred_data,
    x = ~degree_pct,
    y = ~.pred,
    name = "Regression",
    hoverinfo = "none",
    inherit = FALSE,
    line = list(width = 3, color = "#D92702", dash = "dash")
  ) %>% 
  style(
    traces = 1,
    hovertemplate = paste(
      "<b>%{text}</b><br>",
      "Degree %: %{x:.1f}%<br>",
      "APC: %{y:.3f}<extra></extra>"
    )
  ) %>%
  layout(
    xaxis = list(title = "Degree-level qualification (%)"),
    yaxis = list(title = "Prescriptions per capita"),
    title = list(
    text = "Antidepressant prescribing vs degree-level education<br>
            <sup>Scottish Health Boards, Jan–Aug 2025 — Source: Public Health Scotland</sup>"
    )
  )
graph2

Results

A surface level interpretation of this graph says that there is a negative correlation between the variables: health boards with a higher percentage of degree holders have a lower antidepressant prescribing rate. To take just the extremes, Dumfries and Galloway with the highest prescribing rate sits at the far left of the graph, among the bottom health boards in degree attainment, while Lothian has the lowest prescribing rate and is directly juxtaposed, with the highest degree attainment by a large margin.

However, the confidence intervals are wide, and the majority of the health boards are distributed quite close to each other in education level, while still displaying a sizeable range of prescription rates.

Part 4: Focusing in

To try and get a clearer image of the relationship, the same analysis can be done but with Scottish data zones instead of health boards, focusing in on much smaller localities.

The original prescribing data already had GP practice codes, so the prescriptions can be mapped to data zones using a practice lookup data set from Public Health Scotland. After that, the average monthly prescriptions for each data zones can be calculated, and the census data for education added, again using the flexible table builder with data zones instead of health boards this time.

GP practices and list sizes October 2025 available from: https://www.opendata.nhs.scot/dataset/gp-practice-contact-details-and-list-sizes/resource/47557411-7eda-4278-9d6d-d26ed2ceab5a?inner_span=True

## load in practice details
practice_info <- read_csv(here("data/practice_info.csv")) %>% clean_names()

antidepressants_practice <- antidepressants %>%
  inner_join(practice_info, by = c("gp_practice" = "practice_code")) %>% 
  select(c(gp_practice, number_of_paid_items, month, data_zone)) %>% 
  group_by(data_zone, month) %>% 
  summarise(total_items = sum(number_of_paid_items, na.rm = TRUE), .groups = "drop") %>%  # total prescriptions per month for each data zone
  group_by(data_zone) %>% 
  summarise(mean_items = mean(total_items, na.rm = TRUE), .groups = "drop") # average monthly prescriptions from Jan-Aug 2025 for each data zone

raw_lines_dz <- read_lines(here("data/census_education_dz.csv")) # practice dataset uses 2011 data zones, so i chose 2011 data zones for the census table
dz_rows <- grep('^"S\\d{8}"', raw_lines_dz) # use data zone regex to select the rows with the data we need
core_text_dz <- c(raw_lines_dz[10], raw_lines_dz[dz_rows]) # combine header row with the data
census_dz <- read_csv(
  I(core_text_dz), # character vector fed in
  show_col_types = FALSE) %>%
  clean_names() %>%
  rename(data_zone = highest_level_of_qualification, # rename from corner cell to stubhead
         degree = degree_level_qualifications_or_above_education_qualifications_not_already_mentioned_including_foreign_qualifications) %>%
  select(data_zone, degree, total) # only need these columns for investigation

antidepressants_education_dz <- inner_join(antidepressants_practice, census_dz, by = "data_zone")

invisible(anti_join(antidepressants_practice, census_dz, by = "data_zone")) # checking for mismatches/missing values

antidepressants_education_dz <- antidepressants_education_dz %>% 
  mutate(apc = mean_items/total, # antidepressants per capita
         degree_pct = (degree/total)*100) # degree holder percentage for each data zone

ggplot(antidepressants_education_dz, aes(x = degree_pct, y = apc)) +
  geom_point(alpha = 0.4, size = 1.5) +
  geom_smooth(method = "lm", se = TRUE, linetype = "dashed", linewidth = 1, color = "#1aa3ff") +
  scale_colour_viridis_c(option = "plasma") +
  geom_density_2d_filled(alpha = 0.3) +
  theme_minimal(base_size = 12) +
  theme(
  plot.title = element_text(face = "bold", hjust = 0.5),
  plot.subtitle = element_text(hjust = 0.5),
  legend.position = "right",
  legend.title = element_text(size = 10),
  legend.text  = element_text(size = 8)
  ) +
  labs(
    x = "Degree holders (%)",
    y = "Antidepressant prescriptions per capita",
    title = "Antidepressant prescriptions vs Education level",
    subtitle = "Stratified by Scottish data zone",
    caption = "Data: Public Health Scotland; Prescriptions in the Community, January to August 2025"
  ) +
  guides(fill = "none")

Results

This graph communicates a similar relationship to the data on health boards: that a higher educational attainment is negatively correlated with antidepressant prescribing. Though, with only a weakly negative slope, most of the data zones are distributed around 0-3 antidepressants per capita, with a select few outliers near 10 and even one above 15.

Discussion

From the visualisations, there is meaningful geographic variation in antidepressant prescribing across Scotland at the health board level. Many factors could be contributing to this inequality, including local policy, socioeconomic disparities and baseline population characteristic differences. The pattern illustrated from exploring the education levels, however, is more constant: despite the outlier of Lothian, variation is largely negligible.

The relationship between education and antidepressant prescribing at the health board level, by necessity, follows the same story as the first two figures. Lothian generally leads, while Dumfries and Galloway places worst, and the rest are distributed close to each other between the two, suggesting a negative correlation between the variables. On the third figure however, it becomes apparent that the relationship is not cut and dry, as the relatively invariant education levels clash with the scattered prescribing rates, and what’s left is the majority of the health boards lying around the same degree attainment % with a range of antidepressant prescription rates.

On visualisation of the data zones, the same problem arises, where most data zones have around the same prescribing rate and range across a 40 point difference in degree holders. Despite this, there are some localities placing noticeably higher in prescribing rates, and these few data zones tend towards the lower end in degree attainment, tenuously supporting the negative correlation.

There is a meaningful limitation in the use of data zones mapped to individual GP practices, in that the data zone populations are being used as a model of the population of patients the practice serves. This is inherently flawed as the practices are tied to data zones simply by merit of existing within their boundaries, meaning the real patient population will likely differ from that of the data zone. However, it is reasonable to assume significant overlap between the two populations, and high similarity in their baseline characteristics.

Generally, antidepressant prescribing and education level have a negative correlation, but it’s highly likely these are proxies for larger issues that disproportionately affect specific localities.

Next Steps

Despite the uncertainty of the visualisations produced, there are still valid conclusions to be drawn from this exploration.

It is clear that there are localities with significantly high antidepressant prescribing rates relative to the rest of the country, and it’s unlikely to be a coincidence that these are areas with a relatively low education attainment rate. It would be valuable to further investigate just these outlying data zones, to identify disparities in other characteristics of the area like deficient/harmful infrastructure or local policy, insufficient access to support schemes, or disadvantageous environmental factors among many others. These are all factors that would impact both education attainment and antidepressant use, as the two are inextricably linked and involved in constant, lifelong interactions with personal, communal, and systemic aspects of being.

References

  1. Bergvall, S., Fernström, C., Ranehill, E., & Sandberg, A. (2025). The impact of PhD studies on mental health-a longitudinal population study. Journal of health economics, 104, 103070. Advance online publication. https://doi.org/10.1016/j.jhealeco.2025.103070
  2. Maguire, A., Moriarty, J., O’Reilly, D. et al. (2017). Education as a predictor of antidepressant and anxiolytic medication use after bereavement: a population-based record linkage study. Qual Life Res 26, 1251–1262. https://doi.org/10.1007/s11136-016-1440-1
  3. NHS Business Services Authority. Electronic Repeat Dispensing Handbook. 2020. Available at: https://www.nhsbsa.nhs.uk/sites/default/files/2020-07/Electronic%20Dispensing%20Handbook_Digital_WEB_S-1589995676.pdf [Accessed 21 Nov 2025].
  4. Hertfordshire & West Essex Integrated Care Board. (n.d.) Prescription duration guidance. Available at: https://www.hweclinicalguidance.nhs.uk/prescribing-guidance/prescription-duration-guidance-hwe-icb/ [Accessed 21 Nov 2025].